FENS 07 



Cluster Expansion Method for Evolving Weighted Networks Having Vector-like Nodes 

M. Ausloos 

GRAPES, SUPRATECS, U.Lg, B5a Sart-Tilman, B-4000 Liege, Belgium, Euroland 

M. Gligor 

National College Roman Voda Roman-5550, Neamt, Romania 
(Dated: 27/02/2008) 

The Cluster Variation Method known in statistical mechanics and condensed matter is revived 
for weighted bipartite networks. The decomposition of a Hamiltonian through a finite number of 
components, which serve to define variable clusters, is recalled. As an illustration, the network 
built from data representing correlations between four macro-economic features, i.e. the so-called 
vector components, of 15 EU countries, taken as (function) nodes, is discussed. We show that statistical 
physics principles, like the maximum entropy criterion, point to clusters, here in a four-variable phase 
space: Gross Domestic Product (GDP), Final Consumption Expenditure (FCE), Gross Capital 
Formation (GCF) and Net Exports (NEX). It is observed that the maximum entropy corresponds to 
a cluster which does not explicitly include the GDP but only the other three "axes", i.e. consumption, 
investment and trade components. On the other hand, the minimal entropy clustering scheme is 
obtained from a coupling necessarily including GDP and FCE. The results confirm intuitive economic 
theory and practice expectations, at least as regards geographical connexions. The technique can of 
course be applied to many other cases in the physics of socio-economy networks. 

PACS numbers: 89.75.Fb, 89.65.Gh, 89.75.Hc, 87.23.Ge 



I. INTRODUCTION 



In physics one is often interested in models with a finite number $N$ of degrees of freedom, hereby denoted 
by $s = (s_1, s_2, \dots, s_N)$, sometimes taking discrete values, in contrast to continuous ones as in field theories. For 
instance, the variables $s_i$ could take values in $\{0, 1\}$ (binary variables), $\{-1, +1\}$ (Ising spins), or $\{1, 2, \dots, q\}$ (Potts 
variables). Network nodes and/or links can possess such degrees of freedom, which indicate the role of a few variables 
in characterizing or tying nodes together; these variables serve, e.g., to identify clusters, communities, ... in 
the network. Several network characterization techniques based on related discrete value algebra exist in the literature. 

Recall that statistical mechanical models are defined through an energy function, like a Hamiltonian, $\mathcal{H} = \mathcal{H}(s)$; the 
corresponding probability distribution at thermal equilibrium is the Boltzmann distribution: 

$p(s) = \frac{1}{Z} \exp[-\mathcal{H}(s)]$ (1) 

where the inverse temperature $\beta = (k_B T)^{-1}$ has been absorbed into the Hamiltonian, as is often conventionally 
done; 

$Z = \exp[-F] = \sum_{s} \exp[-\mathcal{H}(s)]$ (2) 

is called the partition function and $F$ the free energy. The Hamiltonian is typically a sum of terms, each involving a 
small number of variables. 
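
As a minimal sketch of Eqs. (1)-(2), the partition function and the Boltzmann probabilities can be obtained by brute-force enumeration when $N$ is small. The open Ising chain, the coupling value and the function names used below are illustrative assumptions only; they are not part of the network model discussed in this paper.

```python
import itertools
import math


def hamiltonian(s, coupling=1.0):
    """Toy energy (assumption): open Ising chain, H(s) = -J * sum_i s_i * s_{i+1}.

    The inverse temperature is taken as absorbed into H, as in the text.
    """
    return -coupling * sum(a * b for a, b in zip(s, s[1:]))


def boltzmann(n_spins, coupling=1.0):
    """Enumerate all 2**n_spins states and return (Z, F, probabilities)."""
    states = list(itertools.product((-1, +1), repeat=n_spins))
    weights = [math.exp(-hamiltonian(s, coupling)) for s in states]
    Z = sum(weights)                  # Eq. (2): Z = sum_s exp[-H(s)]
    F = -math.log(Z)                  # Eq. (2): Z = exp[-F]
    probs = {s: w / Z for s, w in zip(states, weights)}  # Eq. (1)
    return Z, F, probs


Z, F, probs = boltzmann(3)
```

The enumeration cost grows as $2^N$, so this only serves to make the definitions concrete for a handful of variables.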

A technique which was of interest a long time ago in condensed matter is the cluster variation approximation 
method [5]. The free energy or the Hamiltonian is expanded through a series in the variables by a systematic 
projection in order to define the interaction energy at each successive cluster size level. We re-introduce the technique 
here, suggesting its power for discussing network properties. We take as an example, for illustration, a finite size 
network, made of nodes being EU countries characterized by their most usual (macroeconomic) features. The 
fluctuation correlations between these serve to define the so-called adjacency matrix, whence the weights of the links 
of the network. 

TABLE I: The minimal path length (MPL) distances to the "average country". Indicator: FCE (= $s_2$). The moving time 
window size is $T = 5$ years for the data [11] taken from 1994 to 2003. 

Window   AUT    BEL    DEU    DNK    ESP    FIN    FRA    GBR    GRC    IRL    ITA    LUX    NLD    PRT    SWE
94-98    0.88   0.65   0.85   0.88   0.65   0.37   0.65   0.65   0.65   0.65   0.37   0.65   0.65   0.65   0.65
95-99    0.79   0.79   0.79   0.81   0.79   0.41   0.79   0.79   0.93   0.79   0.53   0.59   0.79   0.79   0.79
96-00    1.02   1.02   1.02   1.02   1.02   1.02   1.02   1.02   1.02   1.02   0.26   1.02   1.02   1.02   1.02
97-01    0.51   0.51   0.51   0.65   0.51   0.73   0.88   0.51   0.65   0.51   0.33   0.88   0.51   0.51   0.51
98-02    0.52   0.52   0.52   0.96   0.52   0.66   0.95   0.65   0.96   0.52   0.35   1.19   0.52   0.52   0.52
99-03    0.45   0.42   0.45   1.00   0.45   0.53   0.40   0.46   1.00   0.42   0.30   0.92   0.45

The technique appears to be very general and could be useful to sort out features not observed otherwise. 

II. THEORETICAL CONSIDERATIONS 

A useful representation is given by the factor graph. A factor graph [6] is a bipartite graph made of variable nodes 
$i, j, \dots$, one for each variable, and function nodes $a, b, \dots$, one for each term of the Hamiltonian. A link joins a variable 
node $i$ and a function node $a$ if and only if $i \in a$, that is, the variable $s_i$ appears in $\mathcal{H}_a$, the term of the Hamiltonian 
associated to $a$. The Hamiltonian can then be written as 

$\mathcal{H} = \sum_{a}^{N} \mathcal{H}_a(s_a)$ (3) 

with $s_a = \{s_i, i \in a\}$. This sort of writing, through the decomposition of a Hamiltonian into terms describing 
clusters of different (increasing) sizes, has been shown to be of great interest, see [7], e.g. when applying techniques 
like the renormalization group. 

In combinatorial optimization problems, the Hamiltonian plays the role of a cost function and one is often interested 
in the low temperature limit $T \to 0$, where only minimal energy states (ground states) have a nonvanishing probability. 

Probabilistic graphical models are usually defined in a slightly different way [8]. E.g., in the case of Markov random 
fields, also called Markov networks, the joint distribution over all variables is given by 

$p(s) = \frac{1}{Z} \prod_{a} \psi_a(s_a)$ (4) 

where $\psi_a$ is called the potential, and 

$Z = \sum_{s} \prod_{a} \psi_a(s_a)$. (5) 

Of course, a statistical mechanical model described by the Hamiltonian (3) corresponds to a probabilistic graphical 
model with potentials $\psi_a = \exp(-\mathcal{H}_a)$, with the same partition function $Z$ and free energy $F$. 
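
The correspondence between the Hamiltonian form (3) and the Markov random field form (4)-(5) can be checked numerically on a small example. The sketch below assumes a hypothetical factor graph with three binary variables and two function nodes; the energy terms and all names are made up for illustration.

```python
import itertools
import math

# Hypothetical toy factor graph (assumption): function node name ->
# (indices i of the variables belonging to it, energy term H_a(s_a)).
factors = {
    "a": ((0, 1), lambda s0, s1: -1.0 * s0 * s1),
    "b": ((1, 2), lambda s1, s2: 0.5 * s1 * s2),
}


def energy(s):
    """H(s) = sum_a H_a(s_a), as in Eq. (3)."""
    return sum(h(*(s[i] for i in idx)) for idx, h in factors.values())


def potential_product(s):
    """prod_a psi_a(s_a) with potentials psi_a = exp(-H_a), as in Eq. (4)."""
    return math.prod(
        math.exp(-h(*(s[i] for i in idx))) for idx, h in factors.values()
    )


states = list(itertools.product((0, 1), repeat=3))
Z_boltzmann = sum(math.exp(-energy(s)) for s in states)    # Eq. (2)
Z_markov = sum(potential_product(s) for s in states)       # Eq. (5)
```

Both normalizations coincide because the exponential of a sum of terms factorizes into the product of the potentials.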

Next we define a cluster $\alpha$ as a subset of the factor graph such that if a function node belongs to $\alpha$, then all the 
variable nodes $s_a$ also belong to $\alpha$; notice that the converse need not be true, otherwise the only legitimate clusters 
would be the connected components of the factor graph. Given a cluster, we can write its probability distribution, 
defined as the ratio between the number of realized connections and the number of all possible connections, as 



$p_\alpha(s_\alpha) = \sum_{s \setminus s_\alpha} p(s)$ (6) 

and its entropy 

$S_\alpha(s_\alpha) = -\sum p(s) \ln p(s)$. (7) 

TABLE II: The correlation matrix of EU-15 country movements inside the hierarchy. Indicator: FCE. The moving time window 
size is 5 years for data taken from 1994 to 2003. 

III. ILLUSTRATION 

As a short illustration, consider the function nodes to be countries and the variables to be macroeconomic indicators 
[9], i.e. 

1. Consider the nodes to be the first (in time) 15 EU countries. Let the country names be abbreviated according 
to The Roots Web Surname List (RSL) [10], which uses standardized three-letter abbreviations. 

2. Suppose that we are interested in a vector describing each country's (Hamiltonian or) "thermodynamic state" 
with 4 components, i.e. $s_1$ = Gross Domestic Product (GDP), $s_2$ = Final Consumption Expenditure (FCE), $s_3$ 
= Gross Capital Formation (GCF) and $s_4$ = Net Exports (NEX). The World Bank database [11] is used here 
as the data source. Let the data be taken from 1994 to 2004 for GDP and from 1994 to 2003 for FCE, GCF and 
NEX, respectively. 

The yearly fluctuations of these four variables are easily calculated and their auto- and cross-correlation matrices 
easily obtained; see e.g. a discussion for GDP in [12] [13] and more detail elsewhere [14]. Essentially, the correlations 
can be calculated for a time window of given size moving along the time axis; these are used for getting the statistical 
distances among countries, e.g. A and B, for various time window sizes $T$ at various times $t$, where $t$ is the final point 
of the interval, i.e. 

$d_s(A, B)_{(t,T)} = \sqrt{2 \left(1 - C_{(t,T)}(A, B)\right)}$ (8) 

where 

$C_{(t,T)}(A, B) = \frac{\langle A B \rangle_{(t,T)} - \langle A \rangle_{(t,T)} \langle B \rangle_{(t,T)}}{\sqrt{\left(\langle A^2 \rangle_{(t,T)} - \langle A \rangle_{(t,T)}^2\right)\left(\langle B^2 \rangle_{(t,T)} - \langle B \rangle_{(t,T)}^2\right)}}$. (9) 

The brackets $\langle \dots \rangle$ denote the expectation value of the "A, B time series", here in the interval $(t - T, t)$. 

FIG. 1: The factor graph associated to the first 15 EU country connections, according to the strongest correlations in the 
Gross Domestic Product (GDP) and Final Consumption Expenditure (FCE) 

These distances are thus mapped onto ultrametric distances, as in the classical Minimum Spanning Tree (MST) 
method. By calculating the statistical distances with respect to the average value of the index (seen here as an 
"average" country), we get a country hierarchy that proves to be changing from one time interval to another when the 
(constant size) time window is moved over the full time span. The correlation coefficients refer to the movement of 
the countries inside this hierarchy. 
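
The statistical distance of Eqs. (8)-(9), evaluated in a window moving along the time axis, can be sketched as follows. The two yearly series are made-up numbers (not World Bank data), the "average" series is simply their mean, and the function names are hypothetical; only the distance formula itself is taken from the text.

```python
import numpy as np


def statistical_distance(a, b):
    """d = sqrt(2 * (1 - C)), with C the correlation coefficient of Eq. (9)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cov = np.mean(a * b) - np.mean(a) * np.mean(b)
    var_a = np.mean(a**2) - np.mean(a) ** 2
    var_b = np.mean(b**2) - np.mean(b) ** 2
    c = cov / np.sqrt(var_a * var_b)
    # Clip tiny negative round-off before taking the square root.
    return float(np.sqrt(max(0.0, 2.0 * (1.0 - c))))


def moving_window_distances(series_a, series_b, window):
    """One distance per window of `window` consecutive points."""
    n = len(series_a)
    return [
        statistical_distance(series_a[k:k + window], series_b[k:k + window])
        for k in range(n - window + 1)
    ]


# Hypothetical yearly fluctuations over ten years (illustrative values only).
country_a = [2.1, 1.8, 2.5, 3.0, 2.2, 1.1, 0.9, 1.5, 2.0, 2.4]
country_b = [1.9, 1.7, 2.7, 3.2, 2.0, 1.4, 0.7, 1.2, 2.3, 2.1]
average = [(x + y) / 2 for x, y in zip(country_a, country_b)]

distances = moving_window_distances(country_a, average, window=5)
```

Ten yearly points and a window of five points give six windows, which matches the six intervals (94-98 to 99-03) listed in Table I.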

In order to exemplify this method, the corresponding steps for $s_2$ = FCE are explicitly shown below (for $s_1$ = GDP 
the first steps are explicitly described in [15]). After the (virtual) "average" country is introduced in the system, the 
statistical distances corresponding to the fixed 5-year moving time window can be calculated and set in increasing 
order. The minimal path length (MPL) connections to the "average" country can be established for each country in 
every time interval (Table I). The resulting hierarchy is readily found to be changing from one time interval to another. 
The above procedure is repeated for each macroeconomic indicator, leading to three tables similar to Table I. 

Next, a time independent "correlation matrix" can be built, at this stage for the country movement fluctuations 
inside the hierarchy, i.e. averaging the relative MPL fluctuations between countries. In so doing, the 
moving-average-minimal-path-length (MAMPL) method leads us to a set of $M = 4$ correlation matrices (one for each index), 
each of size $N \times N$, where $N = 15$ is the number of countries under consideration here; see e.g. Table II, for FCE (= $s_2$). 
N.B. the matrix is symmetric (half of the elements are shown) but not all elements are necessarily positive. 
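
One possible reading of this step is sketched below: the columns of Table I are treated as short time series (one MPL value per window) and correlated pairwise. The MPL values are copied from the five complete rows of Table I for four countries; the use of a plain Pearson correlation across windows is an assumption made for illustration and is not claimed to be the exact averaging procedure behind Table II.

```python
import numpy as np

# MPL distances to the "average" country for windows 94-98 ... 98-02,
# copied from Table I (FCE) for a subset of countries.
mpl = {
    "AUT": [0.88, 0.79, 1.02, 0.51, 0.52],
    "DNK": [0.88, 0.81, 1.02, 0.65, 0.96],
    "ITA": [0.37, 0.53, 0.26, 0.33, 0.35],
    "LUX": [0.65, 0.59, 1.02, 0.88, 1.19],
}
countries = list(mpl)

# ASSUMPTION: Pearson correlation of the MPL series across windows;
# np.corrcoef treats each row as one variable.
corr = np.corrcoef(np.array([mpl[c] for c in countries]))
```

The result is a symmetric matrix with unit diagonal, so only half of its elements need to be displayed, as noted above.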

Let us suppose that we filter these (four) second correlation matrices in order to retain a few terms, those which lead 
us to build a network for which the weights (i.e. the correlation coefficients) are greater than e.g. 0.9. The correlation 
coefficients, e.g. in the case of the variable node $s_2$ (FCE fluctuations), as given in Table II, are emphasized in bold 
for those > 0.9 at each couple of function nodes. Due to the filtering, one can easily see that not all 15 countries 
have at least one "bold node", i.e. are connected through the variable node $s_2$ (FCE), but only nine of them, namely 
AUT, BEL, DEU, ESP, GBR, IRL, NLD, PRT and SWE. In Fig. 1, these nine countries are connected through the 
variable node FCE (the dashed arrows). 
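
The filtering step can be sketched as a simple threshold on a correlation matrix. The 4 x 4 matrix and the country labels below are hypothetical (they are not the entries of Table II), and the helper name is an assumption; only the threshold value 0.9 comes from the text.

```python
import numpy as np


def countries_linked_to_indicator(corr, names, threshold=0.9):
    """Countries with at least one off-diagonal coefficient above `threshold`."""
    corr = np.asarray(corr, dtype=float)
    strong = corr > threshold
    np.fill_diagonal(strong, False)  # ignore the trivial self-correlations
    return [name for name, row in zip(names, strong) if row.any()]


# Hypothetical symmetric correlation matrix (illustrative values only).
names = ["AAA", "BBB", "CCC", "DDD"]
corr = [
    [1.00, 0.95, 0.40, -0.20],
    [0.95, 1.00, 0.91, 0.10],
    [0.40, 0.91, 1.00, 0.30],
    [-0.20, 0.10, 0.30, 1.00],
]
linked = countries_linked_to_indicator(corr, names)
```

In this toy case the fourth label has no coefficient above the threshold and is therefore not attached to the variable node, in the same way that six of the 15 countries are not attached to FCE.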







FIG. 2: The factor graph associated to the first 15 EU country connections, according to the strongest correlations in the 
Gross Capital Formation (GCF) and Net Exports (NEX) 



The above procedure can be repeated for GDP, GCF and NEX, thereby obtaining the other "clusters" in Figs. 1-2, 
with respectively 9, 8 and 7 countries (for this filter value). Notice that GRC does not belong to any cluster [15]. 

The cluster contributions to the Hamiltonian can thus be written in terms of the variable nodes, e.g. $s_2$. Then one obtains that the cost function 
$\mathcal{H}$ associated to the factor graph (Figs. 1, 2) based on these four variables reads 

$\mathcal{H} = \mathcal{H}_1 + \mathcal{H}_2 + \mathcal{H}_3 + \mathcal{H}_4$, 

where 

$\mathcal{H}_1 = (\mathrm{LUX})(s_4) + (\mathrm{NLD})(s_2)$, 

$\mathcal{H}_2 = (\mathrm{ITA})(s_1, s_4) + (\mathrm{AUT})(s_2, s_3) + (\mathrm{BEL})(s_1, s_2) + (\mathrm{DNK})(s_1, s_3)$, 

$\mathcal{H}_3 = (\mathrm{ESP})(s_2, s_3) + (\mathrm{FIN})(s_3, s_4) + (\mathrm{FRA})(s_1, s_3) + (\mathrm{DEU})(s_1, s_2, s_4) + (\mathrm{GBR})(s_1, s_2, s_3) + (\mathrm{IRL})(s_1, s_2, s_4)$, 

$\mathcal{H}_4 = (\mathrm{PRT})(s_1, s_2, s_3, s_4) + (\mathrm{SWE})(s_1, s_2, s_3, s_4)$, 

from which one could write the equilibrium probability distribution, the partition function and the free energy 
introduced above. 
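
The factor graph encoded by this cost function can be stored as a mapping from each function node (country) to the variable nodes appearing in its term. The sketch below is only one convenient representation; the dictionary layout and variable names are assumptions, while the country-indicator links are read off the terms of the cost function above.

```python
from collections import defaultdict

# Function node (country) -> variable nodes in its term of the cost function.
factor_graph = {
    "LUX": {"NEX"},
    "NLD": {"FCE"},
    "ITA": {"GDP", "NEX"},
    "AUT": {"FCE", "GCF"},
    "BEL": {"GDP", "FCE"},
    "DNK": {"GDP", "GCF"},
    "ESP": {"FCE", "GCF"},
    "FIN": {"GCF", "NEX"},
    "FRA": {"GDP", "GCF"},
    "DEU": {"GDP", "FCE", "NEX"},
    "GBR": {"GDP", "FCE", "GCF"},
    "IRL": {"GDP", "FCE", "NEX"},
    "PRT": {"GDP", "FCE", "GCF", "NEX"},
    "SWE": {"GDP", "FCE", "GCF", "NEX"},
}

# Invert the mapping: variable node -> alphabetically ordered list of countries.
countries_by_indicator = defaultdict(list)
for country, indicators in sorted(factor_graph.items()):
    for indicator in indicators:
        countries_by_indicator[indicator].append(country)

degree = {ind: len(cs) for ind, cs in countries_by_indicator.items()}
n_links = sum(len(indicators) for indicators in factor_graph.values())
```

The degrees of the variable nodes come out as 9 (GDP), 9 (FCE), 8 (GCF) and 7 (NEX), in agreement with the cluster sizes quoted above, and GRC is absent from the graph.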



IV. CONCLUSION 



Instead of writing a Hamiltonian as a function of the function nodes, let us project the dynamics of the factor 
graph into a phase space spanned by the variable nodes. Recall that a cluster $\alpha$ was defined as a subset of the factor 
graph such that if a function node belongs to $\alpha$, then all the variable nodes $s_a$ also belong to $\alpha$. We can write all the 
possible combinations of the four variable nodes and find the Hamiltonian corresponding to function nodes. Let us 
take for example the combination ($s_1$ = GDP; $s_2$ = FCE; $s_3$ = GCF). Then, the function nodes connected only to 
these three variables (not necessarily to all of them) and not to the fourth one ($s_4$ = NEX) are AUT, BEL, DNK, 
ESP, FRA, GBR and NLD. This defines a cluster, which appears in the first row of Table III. The same can be done 
for the other three combinations, leading to another set of clusters. 

In so doing, clustering [17] properties appear through e.g. an entropy, Eq. (7). The values are given in Table 
III for the clusters made of three variable nodes. As a non-obvious consequence of this cluster analysis technique, 
it is observed that the maximum entropy (0.367) corresponds to the clustering which does not explicitly include 
the GDP but only the consumption, investment and trade components. Another point can be deduced from the 
minimal entropy (0.347) clustering scheme, i.e. it is obtained from the coupling between GDP and FCE. The results 
confirm intuitive economic theory and practice expectations, at least as regards geographical connexions. However, 
deep discussions of these findings are left for economists. 
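
The entries of Table III can be recovered from the terms of the cost function with a few lines of code. Two points in this sketch are inferred from the numbers in the table rather than stated in the text, and should be read as assumptions: the maximum number of links is taken as four times the number of countries in the cluster, and the entropy is evaluated as $-p \ln p$ with $p$ the ratio of realized to possible links.

```python
import math
from itertools import combinations

INDICATORS = ("GDP", "FCE", "GCF", "NEX")

# Function node (country) -> variable nodes in its term of the cost function.
factor_graph = {
    "LUX": {"NEX"}, "NLD": {"FCE"},
    "ITA": {"GDP", "NEX"}, "AUT": {"FCE", "GCF"},
    "BEL": {"GDP", "FCE"}, "DNK": {"GDP", "GCF"},
    "ESP": {"FCE", "GCF"}, "FIN": {"GCF", "NEX"}, "FRA": {"GDP", "GCF"},
    "DEU": {"GDP", "FCE", "NEX"}, "GBR": {"GDP", "FCE", "GCF"},
    "IRL": {"GDP", "FCE", "NEX"},
    "PRT": {"GDP", "FCE", "GCF", "NEX"}, "SWE": {"GDP", "FCE", "GCF", "NEX"},
}


def cluster_row(variables):
    """Cluster of countries linked only to `variables`, with its link statistics."""
    allowed = set(variables)
    members = sorted(c for c, used in factor_graph.items() if used <= allowed)
    links = sum(len(factor_graph[c]) for c in members)
    max_links = len(INDICATORS) * len(members)  # ASSUMPTION inferred from Table III
    ratio = links / max_links
    entropy = -ratio * math.log(ratio)          # ASSUMPTION: S = -p ln p, p = ratio
    return members, links, max_links, ratio, entropy


# One row per combination of three variable nodes, in the order of Table III.
table = {combo: cluster_row(combo) for combo in combinations(INDICATORS, 3)}
```

Under these two assumptions the four rows of Table III are reproduced to three decimals, e.g. 14 links out of 28 and an entropy of 0.347 for GDP-FCE-GCF, and 8 links out of 20 and an entropy of 0.367 for FCE-GCF-NEX.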

TABLE III: Clustering of the first 15 EU countries in a 4-variable factor graph approach after filtering (see text) and projecting 
in a three variable node phase space; the number of links in the cluster, the maximum possible number of links, subsequently 
the relevant ratio, and the entropy of each cluster are given. 

variable nodes   cluster of function nodes      number of links   maximum number of links   ratio   entropy
GDP-FCE-GCF      AUT-BEL-DNK-ESP-FRA-GBR-NLD    14                28                        0.500   0.347
GDP-FCE-NEX      BEL-DEU-IRL-ITA-LUX-NLD        12                24                        0.500   0.347
GDP-GCF-NEX      DNK-FIN-FRA-ITA-LUX            9                 20                        0.450   0.359
FCE-GCF-NEX      AUT-ESP-FIN-LUX-NLD            8                 20                        0.400   0.367

In conclusion, let us recall the frame of our work and our findings: a relevant microscopic description of a system relies 
on a coarse-grained reduction of its internal variables. We have presented a way to do so for a bipartite graph having 
on one hand countries, on the other hand macro-economy indicators. We have obtained a Hamiltonian description. 
The technique can of course be generalized and applied to many other socio-economy networks. 

V. CONCLUSION 

Complex networks have become an active field of research in physics [18]. These systems are usually composed of a 
large number of internal components (the nodes and links), and describe a wide variety of systems of high intellectual 
and technological importance. Relevant questions pertain to the characterization of the networks. Investigations 
of the case of directed and/or weighted networks are not so common. The occurrence of community clustering for 
networks having nodes possessing vector-like characteristics has rarely been studied. We have attempted to do so 
through a revival of the cluster variation method in the framework of a macro-economy study. 

We have taken as an example the weighted fully connected network of the $N = 15$ first countries forming the 
European Union in 2005 (EU-25). The ties between countries are supposed to result from (be proportional to) the degree 
of similitude of the annual fluctuation rates of four macro-economic indicators, i.e. Gross Domestic 
Product (GDP), Final Consumption Expenditure (FCE), Gross Capital Formation (GCF) and Net Exports (NEX), 
over a ca. 15-year time span. 

Averaging the yearly increment correlations, a weighted bipartite network has been built having the four "degrees 
of freedom" on one hand and the fifteen "countries" on the other hand as basis. The analysis shows the importance of $N$-body 
interactions, in particular when observing the macro-economy states as a function of time. This leads to identifying and 
displaying clusters of countries, clusters resulting from projections onto a high-dimensional phase space spanned by 
indicators, taken as independent variables. This approach generalizes usual projection methods by accounting for the 
complex geometrical connections resulting from vector-like nodes. 

In particular, such a measure of collective habits does fit the usual and practical expectations defined by politicians, 
journalists, or economists, through so-called "common factors" [19] [20]. The analysis indeed reveals geographical connexions. 
It is expected that the technique can be applied to many types of physical and socio-economic networks. 